Unsupervised models

Unsupervised models

  • Clustering

  • Scaling methods

Clustering

  • Clustering algorithms take a set of inputs and attempt to identify some latent “groups” in the data.

  • These data are assumed to be “unlabeled”: we don’t have specific groups in mind, or at least we haven’t labeled them beforehand.

Clustering: K-means

  • Goal: find the set of “K” group assignments that minimizes the “within-cluster sum of squares” (WCSS)

  • K is any number between 1 and the sample size, and the researcher chooses it.

Clustering: K-means

  • After mean-centering and scaling the data, we’ll eyeball a division for K=2 groups here

Clustering: K-means

  • The “centroid” of each cluster is located at the mean value on each dimension for each cluster.

Clustering: K-means

  • The WCSS is just the sum of squared Euclidean distances between each point in a cluster and that cluster’s centroid.
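As a quick check of this definition (toy data here; any scaled numeric matrix works), the WCSS computed by hand matches what `kmeans` reports as `tot.withinss`:

```r
set.seed(42)
# Toy data: two mean-centered, scaled features
X <- scale(matrix(rnorm(100), ncol = 2))

km <- kmeans(X, centers = 2)

# WCSS by hand: squared distance from each point to its own cluster's centroid
wcss <- sum((X - km$centers[km$cluster, ])^2)

all.equal(wcss, km$tot.withinss)
```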

Clustering: K-means algorithm

  • What about K = 3, or data with more than 2 features?

  • The k-means algorithm searches for good clusters automatically, iterating toward a (locally) minimal WCSS.

Clustering: K-means algorithm

  1. Initialize some random centroids.

  2. Assign each point to its nearest centroid.

  3. Recalculate each centroid as the mean of the points now assigned to it.

  4. Return to step 2; repeat until the assignments stop changing.
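The steps above can be sketched in a few lines of R (a toy illustration, not a substitute for the built-in `kmeans`):

```r
my_kmeans <- function(X, k, iters = 25) {
  # 1. Initialize centroids at k randomly chosen points
  centroids <- X[sample(nrow(X), k), , drop = FALSE]
  for (i in seq_len(iters)) {
    # 2. Assign each point to its nearest centroid
    d  <- as.matrix(dist(rbind(centroids, X)))[-(1:k), 1:k]
    cl <- max.col(-d)
    # 3. Recalculate each centroid as the mean of its assigned points
    centroids <- apply(X, 2, function(col) tapply(col, cl, mean))
    # 4. Loop returns to step 2
  }
  list(cluster = cl, centers = centroids)
}
```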

Clustering: K-means algorithm

The within-cluster sum of squares shrinks at each iteration until the algorithm converges on a (possibly local) minimum.

Clustering: K-means algorithm

  • Incidentally, the clusters here correspond to known categories: they’re three different species of penguin.

Clustering: K-means algorithm

  • We picked K = 3 and two features to keep the display simple, but usually you’ll use more features and more clusters.

  • There are some heuristics for choosing an optimal value of K, but it’s often partly a judgment call.

Try it out

  • Choose some features from the ches data set and then use kmeans to perform K-means clustering on those features.
  • Add the cluster assignments to the CHES data frame, and try to identify what groups - if any - the algorithm is picking up on.
show code to load CHES data
library(tidyverse)

ches <- read_csv("https://www.chesdata.eu/s/CHES_2024_final_v2.csv")

labels <- c("Radical Right", "Conservatives", "Liberal", "Christian-Democratic",
            "Socialist", "Radical Left", "Green", "Regionalist", "No family",
            "Confessional", "Agrarian/Center")
ches$family <- factor(ches$family, labels = labels)

country_levels <- c(1, 2, 3, 4, 5, 6, 7, 8, 10, 11, 12, 13, 14, 16, 20, 21, 22,
                    23, 24, 25, 26, 27, 28, 29, 31, 34, 35, 36, 37, 38, 40, 45)
country_labels <- c("Belgium", "Denmark", "Germany", "Greece", "Spain", "France",
                    "Ireland", "Italy", "Netherlands", "United Kingdom", "Portugal",
                    "Austria", "Finland", "Sweden", "Bulgaria", "Czech Republic",
                    "Estonia", "Hungary", "Latvia", "Lithuania", "Poland", "Romania",
                    "Slovakia", "Slovenia", "Croatia", "Turkey", "Norway", "Switzerland",
                    "Malta", "Luxembourg", "Cyprus", "Iceland")
ches$country <- factor(ches$country, levels = country_levels, labels = country_labels)
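One possible pass at the exercise; the feature choice here (`lrecon` and `galtan`, CHES’s economic and social dimensions) is just an illustration, and any numeric CHES items would work:

```r
# Keep only rows with complete data on the chosen features, then scale them
ches_cc <- ches |> drop_na(lrecon, galtan)
feats   <- scale(ches_cc[, c("lrecon", "galtan")])

km <- kmeans(feats, centers = 3, nstart = 25)
ches_cc$cluster <- factor(km$cluster)

# Which party families fall into which cluster?
table(ches_cc$cluster, ches_cc$family)
```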

Choosing the number of means: the elbow method

  • A heuristic method for determining the appropriate number of clusters is to check for a point of diminishing returns on different values of K.

  • The WCSS value will go down as K increases, but improvements will usually trail off rapidly at a certain point.
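A sketch of the elbow check (`feats` here is a random stand-in for your scaled feature matrix):

```r
set.seed(1)
feats <- scale(matrix(rnorm(200), ncol = 2))  # stand-in for your scaled features

# Total WCSS for K = 1..10; look for the point where the curve flattens
wcss <- sapply(1:10, function(k) kmeans(feats, centers = k, nstart = 25)$tot.withinss)
plot(1:10, wcss, type = "b", xlab = "K", ylab = "Within-cluster sum of squares")
```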

Dimensionality reduction: PCA

  • Dimensionality reduction techniques take a set of variables or relationships and simplify or summarize them in a smaller number of dimensions.

  • Useful for:

    • Visualizing high-dimensional data.
    • Improving supervised model speed or performance.
    • Identifying latent characteristics or factors.

Dimensionality reduction: PCA

Data from the 2024 Chapel Hill Expert Survey.

Dimensionality reduction: PCA

  • We could probably find combinations of variables that would sacrifice very little information while simplifying our model.

Dimensionality reduction: PCA

Principal components analysis takes a matrix and spits out a new one of equal size where:

  • each column is orthogonal to (uncorrelated with) the others
  • columns are ordered by importance, with the columns that “explain” the most variance coming first.
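Both properties are easy to verify with prcomp (using mtcars here just as a stand-in numeric matrix):

```r
X  <- scale(mtcars[, c("mpg", "disp", "hp", "wt")])
pc <- prcomp(X)

# Columns of the rotated data are pairwise uncorrelated
round(cor(pc$x), 10)

# And ordered by the share of variance they explain
summary(pc)
```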

Dimensionality reduction: PCA

PCA

  • One useful application of PCA: visualizing results of K-means clustering performed on high-dimensional data.

PCA

Variations on PCA can also be used to infer similarities or differences between legislators or countries using roll-call votes.

  • Make an N x N matrix that counts how many times each pair of members voted together

  • Calculate the Euclidean “distance” between each pair of legislators

  • Use PCA on the distance matrix and take the first K dimensions

  A B C D
A 1 1 1 0
B 1 1 0 0
C 0 0 1 0
D 1 1 1 1
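Using the toy agreement matrix above, the three steps might look like this sketch:

```r
# The agreement matrix from the slide: how often each pair voted together
agree <- matrix(c(1, 1, 1, 0,
                  1, 1, 0, 0,
                  0, 0, 1, 0,
                  1, 1, 1, 1),
                nrow = 4, byrow = TRUE,
                dimnames = list(c("A", "B", "C", "D"), c("A", "B", "C", "D")))

# Euclidean distances between legislators' agreement profiles
d <- as.matrix(dist(agree))

# PCA on the distance matrix; keep the first K = 2 dimensions
pc <- prcomp(d)
pc$x[, 1:2]
```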

PCA

For instance, here’s the result from scaling UN voting behavior from 2010 to 2019 and taking the first two components:

PCA

Poole and Rosenthal’s DW-NOMINATE scores use something similar to this approach.

source

PCA with supervised models

Finally, PCA can be used as a pre-processing step for supervised models, as a way to address the “curse of dimensionality” problem we talked about last class. Just be sure to “train” the PCA model on the training data and then “predict” on the testing data, just like you would with a supervised model.
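A minimal sketch of that train/predict split (using iris as a stand-in dataset):

```r
set.seed(1)
train_idx <- sample(nrow(iris), 100)
train <- iris[train_idx, 1:4]
test  <- iris[-train_idx, 1:4]

# "Train" the PCA on the training data only...
pc <- prcomp(train, center = TRUE, scale. = TRUE)
train_pcs <- pc$x[, 1:2]

# ...then "predict" applies the same centering, scaling, and rotation to test data
test_pcs <- predict(pc, newdata = test)[, 1:2]
```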

Try it out

  • Use scale to scale your features from the party cluster analysis.

  • Use prcomp to perform the PCA.

  • Extract the first two principal components of the model

  • Plot your clusters using the PCA values, color-coded by cluster.
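One way the pieces fit together (a sketch using iris as a stand-in; swap in your scaled CHES features and cluster fit):

```r
library(ggplot2)

feats <- scale(iris[, 1:4])            # stand-in for your scaled CHES features
km    <- kmeans(feats, centers = 3, nstart = 25)

pc  <- prcomp(feats)
pcs <- as.data.frame(pc$x[, 1:2])      # first two principal components
pcs$cluster <- factor(km$cluster)

ggplot(pcs, aes(PC1, PC2, color = cluster)) +
  geom_point()
```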